ESTIMATION OF NONLINEAR MODELS WITH BERKSON
MEASUREMENT ERRORS

By Liqun Wang 

University of Manitoba 

This paper is concerned with general nonlinear regression models where the predictor variables are subject to Berkson-type measurement errors. The measurement errors are assumed to have a general parametric distribution, which is not necessarily normal. In addition, the distribution of the random error in the regression equation is nonparametric. A minimum distance estimator is proposed, which is based on the first two conditional moments of the response variable given the observed predictor variables. To overcome the possible computational difficulty of minimizing an objective function which involves multiple integrals, a simulation-based estimator is constructed. Consistency and asymptotic normality for both estimators are derived under fairly general regularity conditions.

1. Introduction. In many scientific studies researchers are interested in the nonlinear relationship

$$Y = g(X;\theta) + \varepsilon, \tag{1}$$

where $Y \in \mathbb{R}$ is the response variable, $X \in \mathbb{R}^k$ is the predictor variable, $\theta \in \mathbb{R}^p$ is the unknown regression parameter and $\varepsilon$ is the random error. In many experiments, it is too costly or impossible to measure the predictor $X$ exactly. Instead, a proxy $Z$ of $X$ is measured.

For example, an epidemiologist studies the severity of a lung disease, 
Y, among the residents in a city in relation to the amount of certain air 
pollutants, X. Assume the air pollutants are measured at certain observation 
stations in the city. The actual exposure of the residents to the pollutants X, 
however, may vary randomly from the values Z measured at these stations. 
In this case, X can be expressed as Z plus a random error, which represents
the individual variation in the exposure from the measured exposure. 

Other examples include agricultural or medical studies, where the rela- 
tions between the yield of a crop or the efficacy of a drug, Y , and the 
amount of a fertilizer or drug used, X, are studied. Suppose the fertilizer or 
the drug is applied at predetermined doses Z. The actual absorption of the 
fertilizer in the crop or the drug in the patients' blood, however, may vary 
randomly around the set doses, because of the local earth conditions or the 
individual biological conditions. In these cases, if the amount Z is properly 
calibrated, then the actual absorption X will vary around Z randomly, so 
that on average the random variation $X - Z$ is zero.

In all situations mentioned above, a reasonable model for the measurement errors is the so-called Berkson model

$$X = Z + \delta, \tag{2}$$

where $\delta$ is the unobserved random measurement error, which is assumed to be independent of the observed predictor variable $Z$. More explanations and motivations of the Berkson-error model can be found in Fuller [(1987), pages 79 and 80].

The stochastic structure of the Berkson measurement error model (2) is 
fundamentally different from the classical errors-in-variables model, where 
the measurement error is independent of X, but dependent on Z. This dis- 
tinctive feature leads to completely different procedures in parameter esti- 
mation and inference for the models. 

Estimation of the linear Berkson measurement error models is discussed in 
Fuller [(1987), pages 81-83] and Cheng and Van Ness [(1995), pages 35-38]. 
For nonlinear models, an approximate method called regression calibration is presented by Carroll, Ruppert and Stefanski [(1995), Chapter 3]. Recently, Huwang and Huang (2000) studied a univariate polynomial model where $g(x;\theta)$ is a polynomial in $x$ of a known order and showed that the least squares estimators based on the first two conditional moments of $Y$ given $Z$ are consistent. Wang (2003) considered general univariate nonlinear models where all random errors are normally distributed and showed that the minimum distance estimator based on the first two conditional moments of $Y$ given $Z$ is consistent and asymptotically normally distributed.

In many practical applications, however, there is often more than one predictor variable which is subject to measurement errors. Moreover, the random errors $\varepsilon$ and $\delta$ may have distributions other than the normal distribution. The goal of this paper is to generalize the results of Wang (2003) to the nonlinear models with multivariate predictor variables, where the measurement error $\delta$ has a general parametric distribution $f_\delta(t;\psi)$, $\psi \in \Psi \subset \mathbb{R}^q$, and the random error $\varepsilon$ has a nonparametric distribution with mean zero and variance $\sigma_\varepsilon^2$. Thus, (1) and (2) represent a semiparametric model. Our main interest is to estimate the parameters $\gamma = (\theta', \psi', \sigma_\varepsilon^2)'$. We show that the minimum distance estimator of Wang (2003) is still consistent and asymptotically normally distributed. For the general model in this paper, however, a computational issue arises, because the objective function to be minimized involves multiple integrals for which explicit forms may not always be obtained. To overcome this difficulty, we propose a simulation-based estimator which is shown to be consistent and asymptotically normally distributed under regularity conditions similar to those for the minimum distance estimator.

Throughout the paper we assume that $Z$, $\delta$ and $\varepsilon$ are independent and $Y$ has finite second moment. In addition, we adopt the common assumption in the literature that the measurement error is "nondifferential" in the sense that the conditional expectation of $Y$ given $X$ and $Z$ is the same as the conditional expectation of $Y$ given $X$. Although in this paper $Z$ is assumed to be a random variable, it is easy to see that all results continue to hold if the observations of $Z$, $Z_1, Z_2, \ldots, Z_n$, are treated as fixed constants such that the limits $\lim_{n\to\infty}\sum_{i=1}^n Z_i/n$ and $\lim_{n\to\infty}\sum_{i=1}^n Z_iZ_i'/n$ are finite.

The paper is organized as follows. In Section 2 we give three examples to 
motivate our estimation method. In Section 3 we formally define the mini- 
mum distance estimator and derive its consistency and asymptotic normality 
under some regularity conditions. In Section 4 we propose a simulation-based 
estimator and derive its consistency and asymptotic normality. Finally, con- 
clusion and discussion are given in Section 5, whereas proofs of the theorems 
are given in Section 6. 

2. Examples. To motivate our estimation method, let us consider some examples. To simplify notation, let us consider the case where the measurement error $\delta = (\delta_1, \delta_2, \ldots, \delta_k)'$ has the normal distribution $N(0, \sigma_\delta^2 I_k)$, where $0 < \sigma_\delta^2 < \infty$ and $I_k$ is the $k$-dimensional identity matrix.

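Example 1. First consider the model $g(x;\theta) = \theta_1 x_1 + \theta_3 e^{\theta_2 x_2}$, where $\theta_2\theta_3 \neq 0$. For this model the conditional moment of $Y$ given $Z$ can be written as

$$
\begin{aligned}
E(Y\mid Z) &= \theta_1 Z_1 + \theta_1 E(\delta_1) + \theta_3 e^{\theta_2 Z_2}E(e^{\theta_2\delta_2}) \\
&= \varphi_1 Z_1 + \varphi_3 e^{\varphi_2 Z_2},
\end{aligned}
\tag{3}
$$

where $\varphi_1 = \theta_1$, $\varphi_2 = \theta_2$ and $\varphi_3 = \theta_3 e^{\theta_2^2\sigma_\delta^2/2}$. Similarly, the second conditional moment of $Y$ given $Z$ can be written as

$$
\begin{aligned}
E(Y^2\mid Z) &= \theta_1^2E[(Z_1+\delta_1)^2\mid Z] + \theta_3^2E[e^{2\theta_2(Z_2+\delta_2)}\mid Z] \\
&\quad + 2\theta_1\theta_3E[(Z_1+\delta_1)e^{\theta_2(Z_2+\delta_2)}\mid Z] + E(\varepsilon^2) \\
&= \theta_1^2(Z_1^2 + \sigma_\delta^2) + \theta_3^2e^{2\theta_2Z_2}E(e^{2\theta_2\delta_2}) + 2\theta_1\theta_3Z_1e^{\theta_2Z_2}E(e^{\theta_2\delta_2}) + \sigma_\varepsilon^2 \\
&= \varphi_4 + \varphi_1^2Z_1^2 + \varphi_5e^{2\varphi_2Z_2} + 2\varphi_1\varphi_3Z_1e^{\varphi_2Z_2},
\end{aligned}
\tag{4}
$$

where $\varphi_4 = \theta_1^2\sigma_\delta^2 + \sigma_\varepsilon^2$ and $\varphi_5 = \theta_3^2e^{2\theta_2^2\sigma_\delta^2}$. Since (3) and (4) are the usual nonlinear regression equations and both $Y$ and $Z$ are observable, the $(\varphi_j)$ are identified by these equations and, therefore, can be consistently estimated using the nonlinear least squares method. Furthermore, the original parameters $(\theta_j, \sigma_\delta^2, \sigma_\varepsilon^2)$ are identified because the mapping $(\theta_j, \sigma_\delta^2, \sigma_\varepsilon^2) \mapsto (\varphi_j)$ is bijective. Indeed, it is straightforward to calculate that $\theta_1 = \varphi_1$, $\theta_2 = \varphi_2$, $\theta_3 = \varphi_3\sqrt{\varphi_3^2/\varphi_5}$, $\sigma_\delta^2 = \log(\varphi_5/\varphi_3^2)/\varphi_2^2$ and $\sigma_\varepsilon^2 = \varphi_4 - \varphi_1^2\log(\varphi_5/\varphi_3^2)/\varphi_2^2$.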
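The inverse mapping in Example 1 can be checked numerically by a round trip. The following is a minimal sketch, assuming the normal measurement error of this section; the function names `forward` and `inverse` are hypothetical and chosen only for illustration. The code recovers $\theta_3$ as $\varphi_3\sqrt{\varphi_3^2/\varphi_5}$, which keeps the sign of $\theta_3$ (the squared quantity $\varphi_5$ alone does not carry it), so the example deliberately uses $\theta_3 < 0$.

```python
import math


def forward(theta1, theta2, theta3, s2_delta, s2_eps):
    """Example 1: map (theta_1, theta_2, theta_3, sigma_delta^2, sigma_eps^2) to (phi_1, ..., phi_5)."""
    phi1 = theta1
    phi2 = theta2
    phi3 = theta3 * math.exp(theta2**2 * s2_delta / 2)
    phi4 = theta1**2 * s2_delta + s2_eps
    phi5 = theta3**2 * math.exp(2 * theta2**2 * s2_delta)
    return phi1, phi2, phi3, phi4, phi5


def inverse(phi1, phi2, phi3, phi4, phi5):
    """Invert the map; needs theta_2 * theta_3 != 0, i.e. phi2 != 0 and phi3 != 0."""
    s2_delta = math.log(phi5 / phi3**2) / phi2**2
    theta3 = phi3 * math.sqrt(phi3**2 / phi5)   # the sign of theta_3 is carried by phi_3
    s2_eps = phi4 - phi1**2 * s2_delta
    return phi1, phi2, theta3, s2_delta, s2_eps


params = (0.8, -0.5, -1.3, 0.4, 0.25)   # illustrative values with theta_3 < 0
recovered = inverse(*forward(*params))
print(recovered)
```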

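Example 2. Now consider another model $g(x;\theta) = \theta_1\exp(x'\theta_2)$, where $\theta_1 \neq 0$ and $0 \neq \theta_2 \in \mathbb{R}^k$. For this model the first conditional moment of $Y$ given $Z$ can be written as

$$E(Y\mid Z) = \theta_1e^{Z'\theta_2}E(e^{\delta'\theta_2}) = \varphi_1e^{Z'\varphi_2}, \tag{5}$$

where $\varphi_1 = \theta_1\exp(\theta_2'\theta_2\sigma_\delta^2/2)$ and $\varphi_2 = \theta_2$. The second conditional moment is given by

$$E(Y^2\mid Z) = \theta_1^2e^{2Z'\theta_2}E(e^{2\delta'\theta_2}) + E(\varepsilon^2) = \varphi_3e^{2Z'\varphi_2} + \varphi_4, \tag{6}$$

where $\varphi_3 = \theta_1^2e^{2\theta_2'\theta_2\sigma_\delta^2}$ and $\varphi_4 = \sigma_\varepsilon^2$. Again, the $(\varphi_j)$ are identified by (5) and (6) and the nonlinear least squares method. Furthermore, the original parameters $(\theta_j, \sigma_\delta^2, \sigma_\varepsilon^2)$ are identified because the mapping $(\theta_j, \sigma_\delta^2, \sigma_\varepsilon^2) \mapsto (\varphi_j)$ is bijective. Indeed, straightforward calculation shows that $\theta_1 = \varphi_1\sqrt{\varphi_1^2/\varphi_3}$, $\theta_2 = \varphi_2$, $\sigma_\delta^2 = \log(\varphi_3/\varphi_1^2)/(\varphi_2'\varphi_2)$ and $\sigma_\varepsilon^2 = \varphi_4$.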
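Equations (5) and (6) can also be checked by simulation at one fixed value of $Z$. The sketch below is illustrative only: it assumes $k = 2$, arbitrary parameter values and a normal $\varepsilon$ (any mean-zero distribution with variance $\sigma_\varepsilon^2$ would serve), and compares Monte Carlo averages of $Y$ and $Y^2$ with the right-hand sides of (5) and (6).

```python
import numpy as np

rng = np.random.default_rng(0)
theta1, theta2 = 1.5, np.array([0.4, -0.6])   # illustrative parameter values
s2_delta, s2_eps = 0.2, 0.5
z = np.array([0.3, -0.2])                     # one fixed value of the observed Z
n_draws = 400_000

delta = rng.normal(0.0, np.sqrt(s2_delta), size=(n_draws, 2))   # Berkson error
eps = rng.normal(0.0, np.sqrt(s2_eps), size=n_draws)             # regression error
x = z + delta                                                    # X = Z + delta
y = theta1 * np.exp(x @ theta2) + eps

q = theta2 @ theta2
phi1 = theta1 * np.exp(q * s2_delta / 2)
phi3 = theta1**2 * np.exp(2 * q * s2_delta)
m1 = phi1 * np.exp(z @ theta2)                 # right-hand side of (5)
m2 = phi3 * np.exp(2 * z @ theta2) + s2_eps    # right-hand side of (6), phi_4 = sigma_eps^2

print(y.mean(), m1)
print((y**2).mean(), m2)
```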

Example 3. Further, let us consider the polynomial model $g(x;\theta) = \theta_1x_1 + \theta_2x_2 + \theta_3x_1^2 + \theta_4x_2^2 + \theta_5x_1x_2$. For this model the first two conditional moments are, respectively,

$$E(Y\mid Z) = (\theta_3 + \theta_4)\sigma_\delta^2 + \theta_1Z_1 + \theta_2Z_2 + \theta_3Z_1^2 + \theta_4Z_2^2 + \theta_5Z_1Z_2 \tag{7}$$

and

$$
\begin{aligned}
E(Y^2\mid Z) &= E[g^2(Z+\delta;\theta)\mid Z] + E(\varepsilon^2\mid Z) \\
&= E[g^2(Z+\delta;\theta)\mid Z] + \sigma_\varepsilon^2.
\end{aligned}
\tag{8}
$$

Again, it is easy to see that the parameters $(\theta_j)$ and $\sigma_\delta^2$ are identified by the nonlinear regression equation (7), whereas $\sigma_\varepsilon^2$ is identified by (8). Thus, all parameters in this model can be consistently estimated using the first two conditional moment equations.
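A similar simulation check applies to (7). The sketch assumes illustrative parameter values, $k = 2$ and a normal $\varepsilon$; it averages $g(z+\delta;\theta) + \varepsilon$ over many draws of $\delta$ at one fixed $z$ and compares the result with (7).

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([1.0, -0.5, 0.3, 0.7, -0.4])   # (theta_1, ..., theta_5), illustrative
s2_delta, s2_eps = 0.3, 0.25
z = np.array([0.5, 1.2])
n_draws = 400_000


def g(x, th):
    """Polynomial regression function of Example 3; x has shape (..., 2)."""
    x1, x2 = x[..., 0], x[..., 1]
    return th[0] * x1 + th[1] * x2 + th[2] * x1**2 + th[3] * x2**2 + th[4] * x1 * x2


x = z + rng.normal(0.0, np.sqrt(s2_delta), size=(n_draws, 2))   # X = Z + delta
y = g(x, theta) + rng.normal(0.0, np.sqrt(s2_eps), size=n_draws)

# Right-hand side of (7): g(z) plus the extra intercept (theta_3 + theta_4) * sigma_delta^2
m1 = (theta[2] + theta[3]) * s2_delta + g(z, theta)
print(y.mean(), m1)
```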
The above examples suggest that in many situations, parameters in nonlinear models can be identified and, therefore, consistently estimated using the first two conditional moments of $Y$ given $Z$. The fact that parameters of Berkson measurement error models may be identified in nonlinear regression was first noticed by Rudemo, Ruppert and Streibig (1989). The identifiability of the univariate polynomial model was shown by Huwang and Huang (2000). For general nonlinear models, it is worth noting that even in the case where the mapping $(\theta_j, \sigma_\delta^2, \sigma_\varepsilon^2) \mapsto (\varphi_j)$ is not bijective, the original parameters $(\theta_j, \sigma_\delta^2, \sigma_\varepsilon^2)$ can still be identified if appropriate restrictions on them are imposed. In the next section, we develop a minimum distance estimator for the general nonlinear model (1) and (2) based on the first two conditional moments and derive its asymptotic properties.

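3. Minimum distance estimator. Under the assumptions for model (1) and (2), the first two conditional moments of $Y$ given $Z$ are respectively given by

$$
\begin{aligned}
E(Y\mid Z) &= E[g(Z+\delta;\theta)\mid Z] + E(\varepsilon\mid Z) \\
&= \int g(Z+t;\theta)f_\delta(t;\psi)\,dt
\end{aligned}
\tag{9}
$$

and

$$
\begin{aligned}
E(Y^2\mid Z) &= E[g^2(Z+\delta;\theta)\mid Z] + E(\varepsilon^2\mid Z) \\
&= \int g^2(Z+t;\theta)f_\delta(t;\psi)\,dt + \sigma_\varepsilon^2.
\end{aligned}
\tag{10}
$$

Throughout this paper, unless otherwise stated explicitly, all integrals are taken to be over the space $\mathbb{R}^k$. Further, let $\gamma = (\theta', \psi', \sigma_\varepsilon^2)'$ denote the vector of model parameters and let $\Gamma = \Theta\times\Psi\times\Sigma \subset \mathbb{R}^{p+q+1}$ denote the parameter space. The true parameter value of model (1) and (2) is denoted by $\gamma_0 \in \Gamma$. For every $z \in \mathbb{R}^k$ and $\gamma \in \Gamma$, define

$$m_1(z;\gamma) = \int g(z+t;\theta)f_\delta(t;\psi)\,dt, \tag{11}$$

$$m_2(z;\gamma) = \int g^2(z+t;\theta)f_\delta(t;\psi)\,dt + \sigma_\varepsilon^2. \tag{12}$$

Then $m_1(Z;\gamma_0) = E(Y\mid Z)$ and $m_2(Z;\gamma_0) = E(Y^2\mid Z)$.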

Now suppose $(Y_i, Z_i')'$, $i = 1, 2, \ldots, n$, is an i.i.d. random sample and let

$$\rho(Y_i, Z_i;\gamma) = \big(Y_i - m_1(Z_i;\gamma),\; Y_i^2 - m_2(Z_i;\gamma)\big)'.$$

Then the minimum distance estimator (MDE) $\hat\gamma_n$ for $\gamma$ based on moment equations (9) and (10) is defined as

$$\hat\gamma_n = \operatorname*{argmin}_{\gamma\in\Gamma} Q_n(\gamma),$$

where the objective function

$$Q_n(\gamma) = \sum_{i=1}^n \rho'(Y_i, Z_i;\gamma)W(Z_i)\rho(Y_i, Z_i;\gamma) \tag{13}$$

and $W(Z_i)$ is a $2\times 2$ weighting matrix which may depend on $Z_i$.
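To make the objective function (13) concrete, the following sketch evaluates $Q_n(\gamma)$ for Example 2 with $k = 1$, where $m_1$ and $m_2$ have closed forms under normal $\delta$. The function names, the parameter ordering $\gamma = (\theta_1, \theta_2, \sigma_\delta^2, \sigma_\varepsilon^2)$ and the $(n, 2, 2)$ array layout for the weights $W(Z_i)$ are assumptions made for illustration; in practice $Q_n$ would be handed to a numerical optimizer over the compact set $\Gamma$.

```python
import numpy as np


def moments(z, gamma):
    """Closed-form m1(z; gamma), m2(z; gamma) for Example 2 with k = 1 and normal delta.
    gamma = (theta1, theta2, s2_delta, s2_eps) -- an assumed ordering."""
    theta1, theta2, s2_delta, s2_eps = gamma
    m1 = theta1 * np.exp(theta2 * z + theta2**2 * s2_delta / 2)
    m2 = theta1**2 * np.exp(2 * theta2 * z + 2 * theta2**2 * s2_delta) + s2_eps
    return m1, m2


def q_n(gamma, y, z, weight=None):
    """Objective (13): sum_i rho_i' W(Z_i) rho_i.
    weight: None for W(Z_i) = I_2, otherwise an (n, 2, 2) array holding W(Z_i)."""
    m1, m2 = moments(z, gamma)
    rho = np.column_stack([y - m1, y**2 - m2])   # row i is rho(Y_i, Z_i; gamma)'
    if weight is None:
        return float(np.sum(rho**2))
    return float(np.einsum("ni,nij,nj->", rho, weight, rho))


# Usage: simulate from the model and compare Q_n at the true value and at a wrong value.
rng = np.random.default_rng(1)
n = 20_000
gamma_true = (1.0, 0.5, 0.3, 0.2)
z = 0.5 * rng.normal(size=n)
x = z + rng.normal(0.0, np.sqrt(gamma_true[2]), size=n)          # Berkson: X = Z + delta
y = gamma_true[0] * np.exp(gamma_true[1] * x) + rng.normal(0.0, np.sqrt(gamma_true[3]), size=n)

q_true = q_n(gamma_true, y, z)
q_wrong = q_n((1.3, 0.5, 0.3, 0.2), y, z)
q_identity = q_n(gamma_true, y, z, np.broadcast_to(np.eye(2), (n, 2, 2)))
print(q_true, q_wrong, q_identity)
```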

Regularity conditions under which $\hat\gamma_n$ is identified, consistent and asymptotically normally distributed are well known in the nonlinear regression literature; see, for example, Amemiya [(1985), Chapter 5], Gallant [(1987), Chapter 5] and Seber and Wild [(1989), Chapter 12]. These conditions are usually expressed in a variety of forms.

In the following, we adopt the setup of Amemiya (1985) and express these regularity conditions in terms of the regression function $g(x;\theta)$ and the measurement error distribution $f_\delta(t;\psi)$. Let $\mu$ denote the Lebesgue measure and let $\|\cdot\|$ denote the Euclidean norm. Then we assume the following conditions for the consistency of the MDE $\hat\gamma_n$.

Assumption A1. $g(x;\theta)$ is a measurable function of $x$ for every $\theta \in \Theta$, and is continuous in $\theta \in \Theta$, for $\mu$-almost all $x$.

Assumption A2. $f_\delta(t;\psi)$ is continuous in $\psi \in \Psi$ for $\mu$-almost all $t$.

Assumption A3. The parameter space $\Gamma \subset \mathbb{R}^{p+q+1}$ is compact.

Assumption A4. The weight $W(Z)$ is nonnegative definite with probability 1 and satisfies $E\|W(Z)\| < \infty$.

Assumption A5. $\int\sup_\Psi f_\delta(t;\psi)\,dt < \infty$ and $E(\|W(Z)\| + 1)\big(Y^4 + \int\sup_{\Theta\times\Psi} g^4(Z+t;\theta)f_\delta(t;\psi)\,dt\big) < \infty$.

Assumption A6. $E[\rho(Y,Z;\gamma) - \rho(Y,Z;\gamma_0)]'W(Z)[\rho(Y,Z;\gamma) - \rho(Y,Z;\gamma_0)] = 0$ if and only if $\gamma = \gamma_0$.

The above regularity conditions are common in the literature of nonlinear regression. In particular, Assumptions A1 and A2 are usually used to ensure that the objective function $Q_n(\gamma)$ is continuous in $\gamma$. Similarly, the compactness of the parameter space $\Gamma$ is often assumed. From a practical point of view, Assumption A3 is not as restrictive as it seems to be, because for any given problem one usually has some information about the possible range of the parameters. Assumption A5 contains moment conditions which imply the uniform convergence of $Q_n(\gamma)$. In view of (9) and (10), this assumption means that $Y$ and $\varepsilon$ have finite fourth moments. It is easy to see that Assumptions A1, A2 and A5 are satisfied if $g(x;\theta)$ is a polynomial in $x$ and the measurement error $\delta$ has a normal distribution. Finally, Assumption A6 is the usual condition for identifiability of parameters, which means that the true parameter value $\gamma_0$ is the unique minimizer of the probability limit of $Q_n(\gamma)/n$.

Theorem 1. Under Assumptions A1–A6, the MDE $\hat\gamma_n \xrightarrow{P} \gamma_0$ as $n \to \infty$.



To derive the asymptotic distribution of the MDE $\hat\gamma_n$, we assume the following further regularity conditions.



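Assumption A7. There exist open subsets $\theta_0 \in \Theta_0 \subset \Theta$ and $\psi_0 \in \Psi_0 \subset \Psi$, in which $g(x;\theta)$ is twice continuously differentiable with respect to $\theta$ and $f_\delta(t;\psi)$ is twice continuously differentiable with respect to $\psi$, for $\mu$-almost all $x$ and $t$, respectively. Furthermore, their first two derivatives satisfy

$$
\begin{aligned}
&E\|W(Z)\|\int\sup_{\Theta_0\times\Psi_0}\Big\|\frac{\partial g(Z+t;\theta)}{\partial\theta}\Big\|^4 f_\delta(t;\psi)\,dt < \infty, \\
&E\|W(Z)\|\int\sup_{\Theta_0\times\Psi_0}\Big\|\frac{\partial^2 g(Z+t;\theta)}{\partial\theta\,\partial\theta'}\Big\|^2 f_\delta(t;\psi)\,dt < \infty, \\
&E\|W(Z)\|\int\sup_{\Theta_0\times\Psi_0}|g(Z+t;\theta)|\Big\|\frac{\partial f_\delta(t;\psi)}{\partial\psi}\Big\|\,dt < \infty, \\
&E\|W(Z)\|\int\sup_{\Theta_0\times\Psi_0}g^2(Z+t;\theta)\Big\|\frac{\partial f_\delta(t;\psi)}{\partial\psi}\Big\|\,dt < \infty, \\
&E\|W(Z)\|\int\sup_{\Theta_0\times\Psi_0}\Big\|\frac{\partial g(Z+t;\theta)}{\partial\theta}\Big\|\Big\|\frac{\partial f_\delta(t;\psi)}{\partial\psi}\Big\|\,dt < \infty, \\
&E\|W(Z)\|\int\sup_{\Theta_0\times\Psi_0}|g(Z+t;\theta)|\Big\|\frac{\partial^2 f_\delta(t;\psi)}{\partial\psi\,\partial\psi'}\Big\|\,dt < \infty, \\
&E\|W(Z)\|\int\sup_{\Theta_0\times\Psi_0}g^2(Z+t;\theta)\Big\|\frac{\partial^2 f_\delta(t;\psi)}{\partial\psi\,\partial\psi'}\Big\|\,dt < \infty.
\end{aligned}
$$

Assumption A8. The matrix

$$B = E\left[\frac{\partial\rho'(Y,Z;\gamma_0)}{\partial\gamma}W(Z)\frac{\partial\rho(Y,Z;\gamma_0)}{\partial\gamma'}\right]$$

is nonsingular, where

$$\frac{\partial\rho'(Y,Z;\gamma_0)}{\partial\gamma} = -\left(\frac{\partial m_1(Z;\gamma_0)}{\partial\gamma}, \frac{\partial m_2(Z;\gamma_0)}{\partial\gamma}\right).$$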
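Again, Assumptions A7 and A8 are commonly seen regularity conditions which are sufficient for the asymptotic normality of the minimum distance estimators. Assumption A7 ensures that the first derivative of $Q_n(\gamma)$ admits a first-order Taylor expansion and the second derivative of $Q_n(\gamma)$ converges uniformly. This assumption and the dominated convergence theorem together imply that the first derivatives $\partial m_1(z;\gamma)/\partial\gamma$ and $\partial m_2(z;\gamma)/\partial\gamma$ exist and their elements are respectively given by

$$
\begin{aligned}
\frac{\partial m_1(z;\gamma)}{\partial\theta} &= \int\frac{\partial g(z+t;\theta)}{\partial\theta}f_\delta(t;\psi)\,dt, \\
\frac{\partial m_1(z;\gamma)}{\partial\psi} &= \int g(z+t;\theta)\frac{\partial f_\delta(t;\psi)}{\partial\psi}\,dt, \\
\frac{\partial m_1(z;\gamma)}{\partial\sigma_\varepsilon^2} &= 0
\end{aligned}
$$

and

$$
\begin{aligned}
\frac{\partial m_2(z;\gamma)}{\partial\theta} &= 2\int\frac{\partial g(z+t;\theta)}{\partial\theta}g(z+t;\theta)f_\delta(t;\psi)\,dt, \\
\frac{\partial m_2(z;\gamma)}{\partial\psi} &= \int g^2(z+t;\theta)\frac{\partial f_\delta(t;\psi)}{\partial\psi}\,dt, \\
\frac{\partial m_2(z;\gamma)}{\partial\sigma_\varepsilon^2} &= 1.
\end{aligned}
$$

Finally, Assumption A8 implies that the second derivative of $Q_n(\gamma)$ has a nonsingular limiting matrix. Again, Assumptions A7 and A8 are satisfied for the polynomial model $g(x;\theta)$ and the normal measurement error $\delta$.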
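The derivative formulas above can be checked numerically. The sketch below is an illustration under assumed settings: it takes Example 2 with $k = 1$ and $\psi = \sigma_\delta^2$ (normal $\delta$), evaluates $\int g(z+t;\theta)\,\partial f_\delta(t;\psi)/\partial\psi\,dt$ by a Riemann sum on a fine grid, and compares it with the derivative of the closed form $m_1(z;\gamma) = \theta_1\exp(\theta_2z + \theta_2^2\sigma_\delta^2/2)$.

```python
import numpy as np


def normal_pdf(t, s2):
    """N(0, s2) density, playing the role of f_delta(t; psi) with psi = s2."""
    return np.exp(-t**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)


def dpdf_ds2(t, s2):
    """Derivative of the N(0, s2) density with respect to its variance s2."""
    return normal_pdf(t, s2) * (t**2 / (2 * s2**2) - 1 / (2 * s2))


theta1, theta2, s2 = 1.5, 0.8, 0.3     # illustrative values
z = 0.4
t = np.linspace(-8.0, 8.0, 20_001)     # the integrand is negligible outside this range
dt = t[1] - t[0]
g = theta1 * np.exp(theta2 * (z + t))  # g(z + t; theta) for Example 2

# dm1/dpsi = integral of g(z + t; theta) * d f_delta(t; psi) / d psi  (Riemann sum)
dm1_numeric = np.sum(g * dpdf_ds2(t, s2)) * dt

# Closed form: m1 = theta1 * exp(theta2*z + theta2^2*s2/2), so dm1/ds2 = m1 * theta2^2 / 2
m1 = theta1 * np.exp(theta2 * z + theta2**2 * s2 / 2)
dm1_exact = m1 * theta2**2 / 2
print(dm1_numeric, dm1_exact)
```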



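Theorem 2. Under Assumptions A1–A8, as $n \to \infty$, $\sqrt n(\hat\gamma_n - \gamma_0) \xrightarrow{L} N(0, B^{-1}CB^{-1})$, where

$$C = E\left[\frac{\partial\rho'(Y,Z;\gamma_0)}{\partial\gamma}W(Z)\rho(Y,Z;\gamma_0)\rho'(Y,Z;\gamma_0)W(Z)\frac{\partial\rho(Y,Z;\gamma_0)}{\partial\gamma'}\right]. \tag{14}$$

Furthermore,

$$B = \operatorname*{plim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\frac{\partial\rho'(Y_i,Z_i;\hat\gamma_n)}{\partial\gamma}W(Z_i)\frac{\partial\rho(Y_i,Z_i;\hat\gamma_n)}{\partial\gamma'} \tag{15}$$

and

$$4C = \operatorname*{plim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\frac{\partial Q_{ni}(\hat\gamma_n)}{\partial\gamma}\frac{\partial Q_{ni}(\hat\gamma_n)}{\partial\gamma'}, \tag{16}$$

where $Q_{ni}(\gamma) = \rho'(Y_i,Z_i;\gamma)W(Z_i)\rho(Y_i,Z_i;\gamma)$ is the $i$th summand of $Q_n(\gamma)$.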
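The plug-in estimates (15) and (16) lead directly to a sandwich estimate of the asymptotic covariance. The sketch below is one way to compute it with NumPy, assuming the per-observation Jacobians $\partial\rho(Y_i,Z_i;\hat\gamma_n)/\partial\gamma'$, the residual vectors and the weights have already been evaluated and stacked into arrays; the function name and the array layout are hypothetical. Each summand $\rho_i'W_i\rho_i$ of $Q_n$ has gradient $2(\partial\rho_i'/\partial\gamma)W_i\rho_i$, so the factor 4 in (16) cancels; dividing the result by $n$ approximates the covariance of $\hat\gamma_n$ itself.

```python
import numpy as np


def sandwich_cov(jac, rho, weight):
    """Plug-in estimate of B^{-1} C B^{-1} in the spirit of (15)-(16).

    jac:    (n, 2, d) array; jac[i] = d rho(Y_i, Z_i; gamma_hat) / d gamma'
    rho:    (n, 2) array;    rho[i] = rho(Y_i, Z_i; gamma_hat)
    weight: (n, 2, 2) array; weight[i] = W(Z_i)
    """
    n = rho.shape[0]
    b_hat = np.einsum("nki,nkl,nlj->ij", jac, weight, jac) / n
    # score[i] = (d rho_i' / d gamma) W_i rho_i, half the gradient of the i-th summand of Q_n
    score = np.einsum("nki,nkl,nl->ni", jac, weight, rho)
    c_hat = score.T @ score / n
    b_inv = np.linalg.inv(b_hat)
    return b_inv @ c_hat @ b_inv


# Usage on a toy input where the answer is known: W = I, a fixed Jacobian J and
# residuals whose average outer product is I, so that B = C = J'J.
J = np.array([[1.0, 2.0], [0.0, 1.0]])
jac = np.broadcast_to(J, (4, 2, 2))
rho = np.sqrt(2.0) * np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
weight = np.broadcast_to(np.eye(2), (4, 2, 2))
cov = sandwich_cov(jac, rho, weight)
print(cov)
```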
The MDE $\hat\gamma_n$ depends on the weight $W(Z)$. A natural question is how to choose $W(Z)$ to obtain the most efficient estimator. To answer this question, we first note that, since $\partial\rho'(Y,Z;\gamma_0)/\partial\gamma$ does not depend on $Y$, the matrix $C$ in (14) can be written as

$$C = E\left[\frac{\partial\rho'(Y,Z;\gamma_0)}{\partial\gamma}W(Z)V(Z)W(Z)\frac{\partial\rho(Y,Z;\gamma_0)}{\partial\gamma'}\right],$$

where

$$V(Z) = E[\rho(Y,Z;\gamma_0)\rho'(Y,Z;\gamma_0)\mid Z]$$

has elements

$$
\begin{aligned}
v_{11} &= E[(Y - m_1(Z;\gamma_0))^2\mid Z], \\
v_{22} &= E[(Y^2 - m_2(Z;\gamma_0))^2\mid Z]
\end{aligned}
$$

and

$$v_{12} = E[(Y - m_1(Z;\gamma_0))(Y^2 - m_2(Z;\gamma_0))\mid Z].$$

Then, analogous to weighted (nonlinear) least squares estimation, we have

$$B^{-1}CB^{-1} \ge \left(E\left[\frac{\partial\rho'(Y,Z;\gamma_0)}{\partial\gamma}V(Z)^{-1}\frac{\partial\rho(Y,Z;\gamma_0)}{\partial\gamma'}\right]\right)^{-1} \tag{17}$$

(in the sense that the difference is nonnegative definite), and the lower bound is attained for $W(Z) = V(Z)^{-1}$ in $B$ and $C$. The matrix $V(Z)$ is invertible if its determinant $v_{11}v_{22} - v_{12}^2 > 0$.
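The efficiency bound above can be illustrated numerically. The sketch below assumes, purely for illustration, that $Z$ is uniformly distributed over a handful of points, and draws arbitrary Jacobians $\partial\rho/\partial\gamma'$, conditional covariances $V(Z)$ and weights $W(Z)$ at those points (all hypothetical values). It then checks that $B^{-1}CB^{-1}$ minus the bound is nonnegative definite and that $W(Z) = V(Z)^{-1}$ attains the bound.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_points = 3, 6        # dim(gamma) and number of support points of Z (illustrative)


def random_spd(size):
    """A random symmetric positive definite matrix."""
    a = rng.normal(size=(size, size))
    return a @ a.T + 0.1 * np.eye(size)


jacs = [rng.normal(size=(2, d)) for _ in range(n_points)]   # d rho / d gamma' at each z
vs = [random_spd(2) for _ in range(n_points)]                # V(z) at each z
ws = [random_spd(2) for _ in range(n_points)]                # an arbitrary weight W(z)


def sandwich(weights):
    """B^{-1} C B^{-1}, with expectations taken over the n_points equally likely z."""
    b = sum(j.T @ w @ j for j, w in zip(jacs, weights)) / n_points
    c = sum(j.T @ w @ v @ w @ j for j, v, w in zip(jacs, vs, weights)) / n_points
    b_inv = np.linalg.inv(b)
    return b_inv @ c @ b_inv


bound = np.linalg.inv(sum(j.T @ np.linalg.inv(v) @ j for j, v in zip(jacs, vs)) / n_points)
gap = sandwich(ws) - bound
optimal = sandwich([np.linalg.inv(v) for v in vs])   # W(z) = V(z)^{-1}
print(np.linalg.eigvalsh((gap + gap.T) / 2).min())  # nonnegative up to rounding
print(np.allclose(optimal, bound))
```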

In general, $V(Z)$ is unknown, and it must be estimated before the MDE $\hat\gamma_n$ using $W(Z) = V(Z)^{-1}$ is computed. This can be done using the following two-stage procedure. First, minimize $Q_n(\gamma)$ using the identity matrix $W(Z) = I_2$ to obtain the first-stage estimator $\hat\gamma_n$. Second, estimate $V(Z)$ by

$$\hat v_{11} = \frac{1}{n}\sum_{i=1}^n(Y_i - m_1(Z_i;\hat\gamma_n))^2, \qquad \hat v_{22} = \frac{1}{n}\sum_{i=1}^n(Y_i^2 - m_2(Z_i;\hat\gamma_n))^2$$

and

$$\hat v_{12} = \frac{1}{n}\sum_{i=1}^n(Y_i - m_1(Z_i;\hat\gamma_n))(Y_i^2 - m_2(Z_i;\hat\gamma_n)),$$

and then minimize $Q_n(\gamma)$ again with $W(Z) = \hat V(Z)^{-1}$ to obtain the two-stage estimator $\hat{\hat\gamma}_n$. Since the estimators $\hat v_{ij}$ are consistent for $v_{ij}$, the asymptotic covariance matrix of the two-stage estimator $\hat{\hat\gamma}_n$ is the same as the right-hand side of (17) and, therefore, $\hat{\hat\gamma}_n$ is asymptotically more efficient than the first-stage estimator $\hat\gamma_n$. More detailed discussions about the so-called feasible generalized least squares estimators can be found in, for example, Amemiya (1974) and Gallant [(1987), Chapter 5].
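The second stage of this procedure needs only the first-stage residuals. A minimal sketch, assuming the fitted values $m_1(Z_i;\hat\gamma_n)$ and $m_2(Z_i;\hat\gamma_n)$ from the first stage are available as arrays (the function name is hypothetical): it forms $\hat v_{11}$, $\hat v_{22}$ and $\hat v_{12}$ as sample averages and returns the inverse of the resulting $2\times 2$ matrix, refusing to invert when $\hat v_{11}\hat v_{22} - \hat v_{12}^2 \le 0$.

```python
import numpy as np


def second_stage_weight(y, m1_hat, m2_hat):
    """Estimate v11, v22, v12 by sample averages of the first-stage residuals
    and return the 2x2 weight V_hat^{-1} for the second minimization."""
    r1 = y - m1_hat            # Y_i - m1(Z_i; first-stage estimate)
    r2 = y**2 - m2_hat         # Y_i^2 - m2(Z_i; first-stage estimate)
    v11, v22, v12 = np.mean(r1**2), np.mean(r2**2), np.mean(r1 * r2)
    det = v11 * v22 - v12**2
    if det <= 0:
        raise ValueError("estimated V is not invertible (v11*v22 - v12**2 <= 0)")
    # explicit inverse of [[v11, v12], [v12, v22]]
    return np.array([[v22, -v12], [-v12, v11]]) / det


# Usage with toy residuals (first-stage fitted values set to zero for simplicity)
y = np.array([1.0, 2.0, 3.0, 4.0])
w = second_stage_weight(y, np.zeros(4), np.zeros(4))
v = np.array([[7.5, 25.0], [25.0, 88.5]])   # [[mean y^2, mean y^3], [mean y^3, mean y^4]]
print(w @ v)
```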

4. Simulation-based estimator. The MDE $\hat\gamma_n$ in the previous section is obtained by minimizing the objective function $Q_n(\gamma)$ in (13). The computation can be carried out using the usual numerical optimization procedures if the explicit forms of $m_1(z;\gamma)$ and $m_2(z;\gamma)$ can be obtained. For some regression functions $g(x;\theta)$, however, explicit forms of the integrals in (11) and (12) may be difficult or impossible to derive. In this case, numerical integration techniques such as quadrature methods can be used. In practice, however, the numerical optimization of an objective function involving multiple integrals can be troublesome, especially when the dimension of the integrals is higher than three or four. To overcome this computational difficulty, in this section we consider a simulation-based approach for estimation in which the integrals are simulated by Monte Carlo methods such as importance sampling. This approach is similar to the method of simulated moments (MSM) of McFadden (1989) or Pakes and Pollard (1989).

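The simulation-based estimator can be constructed in the following way. First, choose a known density function $\phi(t)$ and, for each $1 \le i \le n$, generate an i.i.d. random sample $\{t_{is}, s = 1, 2, \ldots, 2S\}$ from $\phi(t)$. Clearly, all samples $\{t_{is}, s = 1, 2, \ldots, 2S, i = 1, 2, \ldots, n\}$ form a sequence of i.i.d. random variables. Then $m_1(z;\gamma)$ and $m_2(z;\gamma)$ can be approximated by the Monte Carlo simulators

$$
\begin{aligned}
m_{1,S}(Z_i;\gamma) &= \frac{1}{S}\sum_{s=1}^{S}\frac{g(Z_i+t_{is};\theta)f_\delta(t_{is};\psi)}{\phi(t_{is})}, \\
m_{1,2S}(Z_i;\gamma) &= \frac{1}{S}\sum_{s=S+1}^{2S}\frac{g(Z_i+t_{is};\theta)f_\delta(t_{is};\psi)}{\phi(t_{is})}, \\
m_{2,S}(Z_i;\gamma) &= \frac{1}{S}\sum_{s=1}^{S}\frac{g^2(Z_i+t_{is};\theta)f_\delta(t_{is};\psi)}{\phi(t_{is})} + \sigma_\varepsilon^2, \\
m_{2,2S}(Z_i;\gamma) &= \frac{1}{S}\sum_{s=S+1}^{2S}\frac{g^2(Z_i+t_{is};\theta)f_\delta(t_{is};\psi)}{\phi(t_{is})} + \sigma_\varepsilon^2.
\end{aligned}
$$

Therefore, a simulated version of the objective function $Q_n(\gamma)$ can be defined as

$$Q_{n,S}(\gamma) = \sum_{i=1}^n\rho_i^{(S)\prime}(\gamma)W(Z_i)\rho_i^{(2S)}(\gamma), \tag{18}$$

where

$$\rho_i^{(S)}(\gamma) = \big(Y_i - m_{1,S}(Z_i;\gamma),\; Y_i^2 - m_{2,S}(Z_i;\gamma)\big)'$$

and

$$\rho_i^{(2S)}(\gamma) = \big(Y_i - m_{1,2S}(Z_i;\gamma),\; Y_i^2 - m_{2,2S}(Z_i;\gamma)\big)'.$$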

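It is not difficult to see that $Q_{n,S}(\gamma)$ approximates $Q_n(\gamma)$ as $S \to \infty$, because by construction

$$E[m_{1,S}(Z_i;\gamma)\mid Z_i] = E[m_{1,2S}(Z_i;\gamma)\mid Z_i] = m_1(Z_i;\gamma)$$

and

$$E[m_{2,S}(Z_i;\gamma)\mid Z_i] = E[m_{2,2S}(Z_i;\gamma)\mid Z_i] = m_2(Z_i;\gamma).$$

In addition, $Q_{n,S}(\gamma)$ is an unbiased simulator for $Q_n(\gamma)$ in the sense that $EQ_{n,S}(\gamma) = EQ_n(\gamma)$, because, given $Y_i$ and $Z_i$, $\rho_i^{(S)}(\gamma)$ and $\rho_i^{(2S)}(\gamma)$ are conditionally independent and hence

$$
\begin{aligned}
E[\rho_i^{(S)\prime}(\gamma)W(Z_i)\rho_i^{(2S)}(\gamma)] &= E[E(\rho_i^{(S)\prime}(\gamma)\mid Y_i,Z_i)W(Z_i)E(\rho_i^{(2S)}(\gamma)\mid Y_i,Z_i)] \\
&= E[\rho'(Y_i,Z_i;\gamma)W(Z_i)\rho(Y_i,Z_i;\gamma)].
\end{aligned}
$$

Finally, the simulation-based estimator (SE) for $\gamma$ is defined by

$$\hat\gamma_{n,S} = \operatorname*{argmin}_{\gamma\in\Gamma}Q_{n,S}(\gamma).$$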
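The construction of $Q_{n,S}(\gamma)$ can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the function signature is hypothetical; the draws $t_{is}$ are generated once and held fixed while $\gamma$ varies, so that $Q_{n,S}$ is a deterministic, smooth function of $\gamma$ for the optimizer; the first $S$ draws feed $\rho_i^{(S)}$ and the last $S$ feed $\rho_i^{(2S)}$; and $g$, $f_\delta$ and $\phi$ are user-supplied vectorized functions. The usage part takes Example 2 with $k = 1$, normal $\delta$ and a standard normal importance density.

```python
import numpy as np


def simulated_objective(gamma, y, z, g, f_delta, phi_pdf, t, weight=None):
    """Q_{n,S}(gamma) of (18).

    gamma  = (theta, psi, s2_eps); y has shape (n,), z has shape (n, k)
    t      : (n, 2S, k) draws t_is from phi, generated once and reused for every gamma
    g(x, theta), f_delta(t, psi), phi_pdf(t): vectorised over all leading axes
    weight : None for W(Z_i) = I_2, otherwise an (n, 2, 2) array
    """
    theta, psi, s2_eps = gamma
    s = t.shape[1] // 2
    gx = g(z[:, None, :] + t, theta)              # g(Z_i + t_is; theta), shape (n, 2S)
    ratio = f_delta(t, psi) / phi_pdf(t)          # importance weights f_delta / phi
    h1, h2 = gx * ratio, gx**2 * ratio
    m1_s, m1_2s = h1[:, :s].mean(axis=1), h1[:, s:].mean(axis=1)
    m2_s = h2[:, :s].mean(axis=1) + s2_eps
    m2_2s = h2[:, s:].mean(axis=1) + s2_eps
    rho_s = np.column_stack([y - m1_s, y**2 - m2_s])      # rho_i^{(S)}: first S draws
    rho_2s = np.column_stack([y - m1_2s, y**2 - m2_2s])   # rho_i^{(2S)}: last S draws
    if weight is None:
        return float(np.sum(rho_s * rho_2s))
    return float(np.einsum("ni,nij,nj->", rho_s, weight, rho_2s))


def g(x, theta):
    return theta[0] * np.exp(theta[1] * x[..., 0])

def f_delta(t, psi):
    return np.exp(-t[..., 0]**2 / (2 * psi)) / np.sqrt(2 * np.pi * psi)

def phi_pdf(t):
    return np.exp(-t[..., 0]**2 / 2) / np.sqrt(2 * np.pi)


rng = np.random.default_rng(3)
n, S = 2000, 200
theta0, psi0, s2e0 = (1.0, 0.5), 0.3, 0.2
z = 0.5 * rng.normal(size=(n, 1))
x = z[:, 0] + rng.normal(0.0, np.sqrt(psi0), size=n)          # X = Z + delta
y = g(x[:, None], theta0) + rng.normal(0.0, np.sqrt(s2e0), size=n)
t = rng.normal(size=(n, 2 * S, 1))                            # drawn once, reused below

q_sim_true = simulated_objective((theta0, psi0, s2e0), y, z, g, f_delta, phi_pdf, t)
q_sim_wrong = simulated_objective(((1.3, 0.5), psi0, s2e0), y, z, g, f_delta, phi_pdf, t)

# Exact Q_n with W = I for comparison (closed-form moments of Example 2)
m1 = np.exp(0.5 * z[:, 0] + 0.25 * psi0 / 2)
m2 = np.exp(z[:, 0] + 2 * 0.25 * psi0) + s2e0
q_exact = float(np.sum((y - m1)**2 + (y**2 - m2)**2))
print(q_sim_true, q_exact, q_sim_wrong)
```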

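Note that, since $Q_{n,S}(\gamma)$ no longer involves integrals, it is continuous in, and differentiable with respect to, $\gamma$, as long as the functions $g(x;\theta)$ and $f_\delta(t;\psi)$ have these properties. In particular, the first derivative of $\rho_i^{(S)}(\gamma)$ becomes

$$\frac{\partial\rho_i^{(S)\prime}(\gamma)}{\partial\gamma} = -\left(\frac{\partial m_{1,S}(Z_i;\gamma)}{\partial\gamma}, \frac{\partial m_{2,S}(Z_i;\gamma)}{\partial\gamma}\right),$$

where $\partial m_{1,S}(Z_i;\gamma)/\partial\gamma$ is the column vector with elements

$$
\begin{aligned}
\frac{\partial m_{1,S}(Z_i;\gamma)}{\partial\theta} &= \frac{1}{S}\sum_{s=1}^S\frac{\partial g(Z_i+t_{is};\theta)}{\partial\theta}\frac{f_\delta(t_{is};\psi)}{\phi(t_{is})}, \\
\frac{\partial m_{1,S}(Z_i;\gamma)}{\partial\psi} &= \frac{1}{S}\sum_{s=1}^S\frac{g(Z_i+t_{is};\theta)}{\phi(t_{is})}\frac{\partial f_\delta(t_{is};\psi)}{\partial\psi}, \\
\frac{\partial m_{1,S}(Z_i;\gamma)}{\partial\sigma_\varepsilon^2} &= 0,
\end{aligned}
$$

and $\partial m_{2,S}(Z_i;\gamma)/\partial\gamma$ is the column vector with elements

$$
\begin{aligned}
\frac{\partial m_{2,S}(Z_i;\gamma)}{\partial\theta} &= \frac{2}{S}\sum_{s=1}^S\frac{\partial g(Z_i+t_{is};\theta)}{\partial\theta}\frac{g(Z_i+t_{is};\theta)f_\delta(t_{is};\psi)}{\phi(t_{is})}, \\
\frac{\partial m_{2,S}(Z_i;\gamma)}{\partial\psi} &= \frac{1}{S}\sum_{s=1}^S\frac{g^2(Z_i+t_{is};\theta)}{\phi(t_{is})}\frac{\partial f_\delta(t_{is};\psi)}{\partial\psi}, \\
\frac{\partial m_{2,S}(Z_i;\gamma)}{\partial\sigma_\varepsilon^2} &= 1.
\end{aligned}
$$

The derivatives $\partial m_{1,2S}(Z_i;\gamma)/\partial\gamma$ and $\partial m_{2,2S}(Z_i;\gamma)/\partial\gamma$ can be given similarly.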

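For the simulation-based estimator, we have the following results.

Theorem 3. Suppose the support of $\phi(t)$ covers the support of $f_\delta(t;\psi)$ for all $\psi \in \Psi$. Then the simulation estimator $\hat\gamma_{n,S}$ has the following properties:

1. Under Assumptions A1–A6, $\hat\gamma_{n,S} \xrightarrow{P} \gamma_0$ as $n \to \infty$.

2. Under Assumptions A1–A8, $\sqrt n(\hat\gamma_{n,S} - \gamma_0) \xrightarrow{L} N(0, B^{-1}C_SB^{-1})$, where

$$
\begin{aligned}
2C_S &= E\left[\frac{\partial\rho_i^{(S)\prime}(\gamma_0)}{\partial\gamma}W(Z_i)\rho_i^{(2S)}(\gamma_0)\rho_i^{(2S)\prime}(\gamma_0)W(Z_i)\frac{\partial\rho_i^{(S)}(\gamma_0)}{\partial\gamma'}\right] \\
&\quad + E\left[\frac{\partial\rho_i^{(S)\prime}(\gamma_0)}{\partial\gamma}W(Z_i)\rho_i^{(2S)}(\gamma_0)\rho_i^{(S)\prime}(\gamma_0)W(Z_i)\frac{\partial\rho_i^{(2S)}(\gamma_0)}{\partial\gamma'}\right].
\end{aligned}
\tag{19}
$$

Furthermore,

$$4C_S = \operatorname*{plim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\frac{\partial Q_{n,S,i}(\hat\gamma_{n,S})}{\partial\gamma}\frac{\partial Q_{n,S,i}(\hat\gamma_{n,S})}{\partial\gamma'}, \tag{20}$$

where $Q_{n,S,i}(\gamma) = \rho_i^{(S)\prime}(\gamma)W(Z_i)\rho_i^{(2S)}(\gamma)$ is the $i$th summand of $Q_{n,S}(\gamma)$.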

In general, the simulation-based estimator $\hat\gamma_{n,S}$ is less efficient than the MDE $\hat\gamma_n$ of the previous section, due to the simulation approximation of $\rho_i(\gamma)$ by $\rho_i^{(S)}(\gamma)$ and $\rho_i^{(2S)}(\gamma)$. A natural question is how much efficiency is lost due to simulation. The following corollary provides an answer to this question.

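Corollary 4. Under the conditions of Theorem 3, it holds that

$$C_S = C + \frac{1}{2S}E(\xi_i\xi_i') + \frac{1}{4S^2}E(\eta_i\eta_i'), \tag{21}$$

where $\rho_i = \rho(Y_i,Z_i;\gamma_0)$,

$$\xi_i = \frac{\partial\rho_i'}{\partial\gamma}W(Z_i)(\rho_{i1} - \rho_i) + \frac{\partial(\rho_{i1} - \rho_i)'}{\partial\gamma}W(Z_i)\rho_i,$$

$$\eta_i = \frac{\partial(\rho_{i1} - \rho_i)'}{\partial\gamma}W(Z_i)(\rho_{i2} - \rho_i) + \frac{\partial(\rho_{i2} - \rho_i)'}{\partial\gamma}W(Z_i)(\rho_{i1} - \rho_i),$$

and

$$\rho_{is} = \left(Y_i - \frac{g(Z_i+t_{is};\theta_0)f_\delta(t_{is};\psi_0)}{\phi(t_{is})},\; Y_i^2 - \frac{g^2(Z_i+t_{is};\theta_0)f_\delta(t_{is};\psi_0)}{\phi(t_{is})} - \sigma_{\varepsilon0}^2\right)'$$

is the summand in $\rho_i^{(S)}(\gamma_0) = \sum_{s=1}^S\rho_{is}/S$.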
The above corollary shows that the efficiency loss caused by simulation is of order $O(1/S)$: the larger the simulation size $S$, the smaller the efficiency loss. In the limit $S \to \infty$, the importance density $\phi(t)$ has no effect on the efficiency of the estimator, as long as it satisfies the condition of Theorem 3. In practice, however, the choice of $\phi(t)$ will affect the finite sample variances of the Monte Carlo estimators such as $m_{1,S}(Z_i;\gamma)$. Theoretically, the best choice of $\phi(t)$ is proportional to the absolute value of the integrand, which is $|g(z+t;\theta)f_\delta(t;\psi)|$ for $m_1(z;\gamma)$. Practically, a density close to being proportional to the integrand is a good choice. For more detailed discussion about importance sampling and variance reduction methods for numerical integration, see, for example, Evans and Swartz [(2000), Chapter 6].
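To illustrate that the choice of $\phi(t)$ matters for finite $S$, the sketch below compares two importance densities for $m_{1,S}(z)$. It assumes, for illustration only, $k = 1$, $g(x) = e^x$ and $\delta \sim N(0, \sigma_\delta^2)$. In that case the integrand $e^{z+t}f_\delta(t)$ equals $e^{z+\sigma_\delta^2/2}$ times the $N(\sigma_\delta^2, \sigma_\delta^2)$ density, so drawing from that density makes every simulated value equal to $m_1(z)$ up to rounding, whereas a standard normal $\phi$ gives an unbiased but noisy simulator.

```python
import numpy as np

rng = np.random.default_rng(4)
s2_delta, z, S, reps = 0.25, 0.3, 50, 2000     # illustrative values; k = 1, g(x) = exp(x)


def normal_pdf(t, mean, var):
    return np.exp(-(t - mean)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)


def m1_sim(draws, phi_mean, phi_var):
    """One value of m_{1,S}(z) from draws of the importance density N(phi_mean, phi_var)."""
    weights = normal_pdf(draws, 0.0, s2_delta) / normal_pdf(draws, phi_mean, phi_var)
    return np.mean(np.exp(z + draws) * weights)


# phi proportional to the integrand exp(z + t) f_delta(t): the N(s2_delta, s2_delta) density
tuned = [m1_sim(rng.normal(s2_delta, np.sqrt(s2_delta), S), s2_delta, s2_delta)
         for _ in range(reps)]
# a generic choice: the standard normal density
generic = [m1_sim(rng.normal(0.0, 1.0, S), 0.0, 1.0) for _ in range(reps)]

exact = np.exp(z + s2_delta / 2)                # closed-form m_1(z) for this model
print(np.mean(tuned), np.var(tuned), np.mean(generic), np.var(generic), exact)
```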

In light of Corollary 4, the discussion in the previous section about the optimal choice of the weight $W(Z) = V(Z)^{-1}$ applies to the simulation-based estimator too, and will not be repeated here.

5. Conclusion. We have considered general nonlinear regression models 
with Berkson measurement errors in predictor variables. The measurement 
errors are assumed to have a general parametric distribution which is not 
necessarily normal, whereas the distribution of the random error in the re- 
gression equation is nonparametric. We have proposed a minimum distance 
estimator based on the first two conditional moments of the response variable 
given the observed predictor variables. We have shown that this estimator 
is consistent and asymptotically normally distributed under fairly general 
regularity conditions. To overcome the computational difficulty which may 
arise in the case where the objective function involves multiple integrals, 
a simulation-based estimator has been constructed. The consistency and 
asymptotic normality for this estimator have also been derived under regu- 
larity conditions similar to those for the minimum distance estimator. The 
results obtained generalize those of Wang (2003), which deals with the uni- 
variate model under normal distributions. 

6. Proofs. 

6.1. Preliminary. First, for ease of reading we restate some existing results which are used in the proofs. For this purpose, let $X = (X_1, X_2, \ldots, X_n)$ be an i.i.d. random sample and let $\gamma$ be a vector of unknown parameters. Further, let $H(X_i;\gamma)$ and $S_n(X;\gamma)$ be measurable functions for any $\gamma \in \Gamma$, and be continuous in $\gamma \in \Gamma$ for almost all possible values of $X$. In addition, the parameter space $\Gamma$ is compact. Using this notation, Theorems 4.2.1, 4.1.1 and 4.1.5 of Amemiya (1985) can be stated as follows.

Lemma 5. Suppose $E\sup_{\gamma\in\Gamma}|H(X_1;\gamma)| < \infty$. Then $\frac{1}{n}\sum_{i=1}^nH(X_i;\gamma)$ converges in probability to $EH(X_1;\gamma)$ uniformly in $\gamma \in \Gamma$.

Lemma 6. Suppose, as $n \to \infty$, $S_n(X;\gamma)$ converges in probability to a nonstochastic function $S(\gamma)$ uniformly in $\gamma \in \Gamma$, and $S(\gamma)$ attains a unique minimum at $\gamma_0 \in \Gamma$. Then the estimator $\hat\gamma_n = \operatorname{argmin}_{\gamma\in\Gamma}S_n(X;\gamma)$ converges in probability to $\gamma_0$.

Lemma 7. Suppose, as $n \to \infty$, $S_n(X;\gamma)$ converges in probability to a nonstochastic function $S(\gamma)$ uniformly in $\gamma$ in an open neighborhood of $\gamma_0$, and $S(\gamma)$ is continuous at $\gamma_0$. Then $\operatorname{plim}_{n\to\infty}\hat\gamma_n = \gamma_0$ implies $\operatorname{plim}_{n\to\infty}S_n(X;\hat\gamma_n) = S(\gamma_0)$.

To simplify the notation in the proofs, we will denote $\rho(Y_i,Z_i;\gamma)$ as $\rho_i(\gamma)$, and $W(Z_i)$ as $W_i$, as far as this causes no confusion. For any matrix $A$, its Euclidean norm is denoted as $\|A\| = \sqrt{\operatorname{trace}(A'A)}$, and $\operatorname{vec}A$ denotes the column vector consisting of the columns of $A$. Further, $\otimes$ denotes the Kronecker product operator.

Proof of Theorem 1. We show that Assumptions A1–A6 are sufficient for all conditions of Lemma 6. First, by Hölder's inequality and Assumption A5 we have

$$E(\|W_1\| + 1)\int\sup_\Gamma|g(Z_1+t;\theta)|^jf_\delta(t;\psi)\,dt < \infty \tag{22}$$

for $j = 1, 2, 3$. It follows from Assumptions A1, A2 and the dominated convergence theorem that $m_1(z;\gamma)$, $m_2(z;\gamma)$ and therefore $Q_n(\gamma)$ are continuous in $\gamma \in \Gamma$. Let

$$Q(\gamma) = E\rho_1'(\gamma)W_1\rho_1(\gamma).$$
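Again by Hölder's inequality, (22) and Assumption A3 we have

$$E\|W_1\|\sup_\Gamma[Y_1 - m_1(Z_1;\gamma)]^2 \le 2E\|W_1\|Y_1^2 + 2E\|W_1\|\sup_\Gamma m_1^2(Z_1;\gamma) < \infty$$

and

$$
\begin{aligned}
E\|W_1\|\sup_\Gamma[Y_1^2 - m_2(Z_1;\gamma)]^2 &\le 3E\|W_1\|Y_1^4 + 3E\|W_1\|\sup_\Gamma\left(\int g^2(Z_1+t;\theta)f_\delta(t;\psi)\,dt\right)^2 \\
&\quad + 3E\|W_1\|\sup_\Gamma\sigma_\varepsilon^4 < \infty,
\end{aligned}
$$

which imply

$$E\sup_\Gamma\rho_1'(\gamma)W_1\rho_1(\gamma) \le E\|W_1\|\sup_\Gamma\|\rho_1(\gamma)\|^2 < \infty.$$

It follows from Lemma 5 that $\frac{1}{n}Q_n(\gamma)$ converges in probability to $Q(\gamma)$ uniformly in $\gamma \in \Gamma$. Further, since $\rho_1(\gamma) - \rho_1(\gamma_0)$ depends on $Z_1$ only and $E(\rho_1(\gamma_0)\mid Z_1) = 0$,

$$E[\rho_1'(\gamma_0)W_1(\rho_1(\gamma) - \rho_1(\gamma_0))] = E[E(\rho_1'(\gamma_0)\mid Z_1)W_1(\rho_1(\gamma) - \rho_1(\gamma_0))] = 0,$$

and we have

$$Q(\gamma) = Q(\gamma_0) + E[(\rho_1(\gamma) - \rho_1(\gamma_0))'W_1(\rho_1(\gamma) - \rho_1(\gamma_0))].$$

It follows that $Q(\gamma) \ge Q(\gamma_0)$ and, by Assumption A6, equality holds if and only if $\gamma = \gamma_0$. Thus all conditions of Lemma 6 hold and, therefore, $\hat\gamma_n \xrightarrow{P} \gamma_0$ follows. □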



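Proof of Theorem 2. By Assumption A7 the first derivative $\partial Q_n(\gamma)/\partial\gamma$ exists and has a first-order Taylor expansion in a neighborhood $\Gamma_0 \subset \Gamma$ of $\gamma_0$. Since $\partial Q_n(\hat\gamma_n)/\partial\gamma = 0$ and $\hat\gamma_n \xrightarrow{P} \gamma_0$, for sufficiently large $n$ we have

$$0 = \frac{\partial Q_n(\gamma_0)}{\partial\gamma} + \frac{\partial^2Q_n(\tilde\gamma_n)}{\partial\gamma\,\partial\gamma'}(\hat\gamma_n - \gamma_0), \tag{23}$$

where $\|\tilde\gamma_n - \gamma_0\| \le \|\hat\gamma_n - \gamma_0\|$. The first derivative of $Q_n(\gamma)$ in (23) is given by

$$\frac{\partial Q_n(\gamma)}{\partial\gamma} = 2\sum_{i=1}^n\frac{\partial\rho_i'(\gamma)}{\partial\gamma}W_i\rho_i(\gamma),$$

where

$$\frac{\partial\rho_i'(\gamma)}{\partial\gamma} = -\left(\frac{\partial m_1(Z_i;\gamma)}{\partial\gamma}, \frac{\partial m_2(Z_i;\gamma)}{\partial\gamma}\right)$$

and the first derivatives of $m_1(Z_i;\gamma)$ and $m_2(Z_i;\gamma)$ with respect to $\gamma$ are given following Assumption A8. Since $E(\rho_i(\gamma_0)\mid Z_i) = 0$, the summands at $\gamma_0$ have mean zero; therefore, by the central limit theorem we have

$$\frac{1}{\sqrt n}\frac{\partial Q_n(\gamma_0)}{\partial\gamma} \xrightarrow{L} N(0, 4C), \tag{24}$$

where

$$C = E\left[\frac{\partial\rho_1'(\gamma_0)}{\partial\gamma}W_1\rho_1(\gamma_0)\rho_1'(\gamma_0)W_1\frac{\partial\rho_1(\gamma_0)}{\partial\gamma'}\right]$$

as is given in (14). The second derivative of $Q_n(\gamma)$ in (23) is given by

$$\frac{\partial^2Q_n(\gamma)}{\partial\gamma\,\partial\gamma'} = 2\sum_{i=1}^n\left[\frac{\partial\rho_i'(\gamma)}{\partial\gamma}W_i\frac{\partial\rho_i(\gamma)}{\partial\gamma'} + (\rho_i'(\gamma)W_i\otimes I_{p+q+1})\frac{\partial\operatorname{vec}(\partial\rho_i'(\gamma)/\partial\gamma)}{\partial\gamma'}\right].$$

Again, by Assumption A7, the nonzero elements in $\partial^2m_1(z;\gamma)/\partial\gamma\,\partial\gamma'$ are

$$
\begin{aligned}
\frac{\partial^2m_1(z;\gamma)}{\partial\theta\,\partial\theta'} &= \int\frac{\partial^2g(z+t;\theta)}{\partial\theta\,\partial\theta'}f_\delta(t;\psi)\,dt, \\
\frac{\partial^2m_1(z;\gamma)}{\partial\theta\,\partial\psi'} &= \int\frac{\partial g(z+t;\theta)}{\partial\theta}\frac{\partial f_\delta(t;\psi)}{\partial\psi'}\,dt, \\
\frac{\partial^2m_1(z;\gamma)}{\partial\psi\,\partial\psi'} &= \int g(z+t;\theta)\frac{\partial^2f_\delta(t;\psi)}{\partial\psi\,\partial\psi'}\,dt,
\end{aligned}
$$

and the nonzero elements in $\partial^2m_2(z;\gamma)/\partial\gamma\,\partial\gamma'$ are

$$
\begin{aligned}
\frac{\partial^2m_2(z;\gamma)}{\partial\theta\,\partial\theta'} &= 2\int\left[\frac{\partial^2g(z+t;\theta)}{\partial\theta\,\partial\theta'}g(z+t;\theta) + \frac{\partial g(z+t;\theta)}{\partial\theta}\frac{\partial g(z+t;\theta)}{\partial\theta'}\right]f_\delta(t;\psi)\,dt, \\
\frac{\partial^2m_2(z;\gamma)}{\partial\theta\,\partial\psi'} &= 2\int g(z+t;\theta)\frac{\partial g(z+t;\theta)}{\partial\theta}\frac{\partial f_\delta(t;\psi)}{\partial\psi'}\,dt, \\
\frac{\partial^2m_2(z;\gamma)}{\partial\psi\,\partial\psi'} &= \int g^2(z+t;\theta)\frac{\partial^2f_\delta(t;\psi)}{\partial\psi\,\partial\psi'}\,dt.
\end{aligned}
$$

Analogously to the proof of Theorem 1, we can verify by Assumption A7 and Lemma 5 that $\frac{1}{n}\partial^2Q_n(\gamma)/\partial\gamma\,\partial\gamma'$ converges in probability to $\partial^2Q(\gamma)/\partial\gamma\,\partial\gamma'$ uniformly in $\gamma \in \Gamma_0$. Therefore, by Lemma 7 we have

$$
\begin{aligned}
\frac{1}{n}\frac{\partial^2Q_n(\tilde\gamma_n)}{\partial\gamma\,\partial\gamma'} &\xrightarrow{P} 2E\left[\frac{\partial\rho_1'(\gamma_0)}{\partial\gamma}W_1\frac{\partial\rho_1(\gamma_0)}{\partial\gamma'}\right] + 2E\left[(\rho_1'(\gamma_0)W_1\otimes I_{p+q+1})\frac{\partial\operatorname{vec}(\partial\rho_1'(\gamma_0)/\partial\gamma)}{\partial\gamma'}\right] \\
&= 2B,
\end{aligned}
\tag{25}
$$

where the last equality holds because

$$
\begin{aligned}
E\left[(\rho_1'(\gamma_0)W_1\otimes I_{p+q+1})\frac{\partial\operatorname{vec}(\partial\rho_1'(\gamma_0)/\partial\gamma)}{\partial\gamma'}\right] &= E\left[(E(\rho_1'(\gamma_0)\mid Z_1)W_1\otimes I_{p+q+1})\frac{\partial\operatorname{vec}(\partial\rho_1'(\gamma_0)/\partial\gamma)}{\partial\gamma'}\right] \\
&= 0.
\end{aligned}
$$

It follows then from (23)–(25), Assumption A8 and the Slutsky theorem that $\sqrt n(\hat\gamma_n - \gamma_0) \xrightarrow{L} N(0, B^{-1}CB^{-1})$. Finally, (15) and (16) can be similarly verified by Lemma 7. □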



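Proof of Theorem 3. The proof for part 1 of Theorem 3 is analogous to that for Theorem 1. First, Assumptions A1 and A2 imply that $Q_{n,S}(\gamma)$ is continuous in $\gamma \in \Gamma$. Then, by Lemma 5 we have, as $n \to \infty$, uniformly in $\gamma \in \Gamma$ that

$$
\begin{aligned}
\frac{1}{n}Q_{n,S}(\gamma) &\xrightarrow{P} E[\rho_1^{(S)\prime}(\gamma)W(Z_1)\rho_1^{(2S)}(\gamma)] \\
&= E[E(\rho_1^{(S)\prime}(\gamma)\mid Y_1,Z_1)W(Z_1)E(\rho_1^{(2S)}(\gamma)\mid Y_1,Z_1)] \\
&= E[\rho_1'(\gamma)W(Z_1)\rho_1(\gamma)] \\
&= Q(\gamma).
\end{aligned}
$$

Finally, $\hat\gamma_{n,S} \xrightarrow{P} \gamma_0$ follows from Assumption A6 and Lemma 6.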

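The proof of part 2 of Theorem 3 is analogous to that of Theorem 2. First, by Assumption A7 we have the first-order Taylor expansion of $\partial Q_{n,S}(\gamma)/\partial\gamma$ in a neighborhood $\Gamma_0 \subset \Gamma$ of $\gamma_0$,

$$0 = \frac{\partial Q_{n,S}(\gamma_0)}{\partial\gamma} + \frac{\partial^2Q_{n,S}(\tilde\gamma_{n,S})}{\partial\gamma\,\partial\gamma'}(\hat\gamma_{n,S} - \gamma_0), \tag{26}$$

where $\|\tilde\gamma_{n,S} - \gamma_0\| \le \|\hat\gamma_{n,S} - \gamma_0\|$ and the first derivative of $Q_{n,S}(\gamma)$ is given by

$$\frac{\partial Q_{n,S}(\gamma)}{\partial\gamma} = \sum_{i=1}^n\left[\frac{\partial\rho_i^{(S)\prime}(\gamma)}{\partial\gamma}W_i\rho_i^{(2S)}(\gamma) + \frac{\partial\rho_i^{(2S)\prime}(\gamma)}{\partial\gamma}W_i\rho_i^{(S)}(\gamma)\right].$$

At $\gamma = \gamma_0$, all terms in the above summation are i.i.d. with mean zero and, since $\rho_i^{(S)}(\gamma_0)$ has the same distribution as $\rho_i^{(2S)}(\gamma_0)$, they have the common covariance matrix

$$
\begin{aligned}
&E\left[\frac{\partial\rho_i^{(S)\prime}(\gamma_0)}{\partial\gamma}W_i\rho_i^{(2S)}(\gamma_0)\rho_i^{(2S)\prime}(\gamma_0)W_i\frac{\partial\rho_i^{(S)}(\gamma_0)}{\partial\gamma'}\right] + E\left[\frac{\partial\rho_i^{(2S)\prime}(\gamma_0)}{\partial\gamma}W_i\rho_i^{(S)}(\gamma_0)\rho_i^{(S)\prime}(\gamma_0)W_i\frac{\partial\rho_i^{(2S)}(\gamma_0)}{\partial\gamma'}\right] \\
&+ E\left[\frac{\partial\rho_i^{(S)\prime}(\gamma_0)}{\partial\gamma}W_i\rho_i^{(2S)}(\gamma_0)\rho_i^{(S)\prime}(\gamma_0)W_i\frac{\partial\rho_i^{(2S)}(\gamma_0)}{\partial\gamma'}\right] + E\left[\frac{\partial\rho_i^{(2S)\prime}(\gamma_0)}{\partial\gamma}W_i\rho_i^{(S)}(\gamma_0)\rho_i^{(2S)\prime}(\gamma_0)W_i\frac{\partial\rho_i^{(S)}(\gamma_0)}{\partial\gamma'}\right] = 4C_S.
\end{aligned}
$$

It follows by the central limit theorem that, as $n \to \infty$,

$$\frac{1}{\sqrt n}\frac{\partial Q_{n,S}(\gamma_0)}{\partial\gamma} \xrightarrow{L} N(0, 4C_S). \tag{27}$$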

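Now, the second derivative in (26) is given by

$$
\begin{aligned}
\frac{\partial^2Q_{n,S}(\gamma)}{\partial\gamma\,\partial\gamma'} &= \sum_{i=1}^n\left[\frac{\partial\rho_i^{(S)\prime}(\gamma)}{\partial\gamma}W_i\frac{\partial\rho_i^{(2S)}(\gamma)}{\partial\gamma'} + (\rho_i^{(2S)\prime}(\gamma)W_i\otimes I_{p+q+1})\frac{\partial\operatorname{vec}(\partial\rho_i^{(S)\prime}(\gamma)/\partial\gamma)}{\partial\gamma'}\right] \\
&\quad + \sum_{i=1}^n\left[\frac{\partial\rho_i^{(2S)\prime}(\gamma)}{\partial\gamma}W_i\frac{\partial\rho_i^{(S)}(\gamma)}{\partial\gamma'} + (\rho_i^{(S)\prime}(\gamma)W_i\otimes I_{p+q+1})\frac{\partial\operatorname{vec}(\partial\rho_i^{(2S)\prime}(\gamma)/\partial\gamma)}{\partial\gamma'}\right],
\end{aligned}
$$

where

$$\frac{\partial\rho_i^{(S)\prime}(\gamma)}{\partial\gamma} = -\left(\frac{\partial m_{1,S}(Z_i;\gamma)}{\partial\gamma}, \frac{\partial m_{2,S}(Z_i;\gamma)}{\partial\gamma}\right)$$

and the nonzero elements in $\partial^2m_{1,S}(Z_i;\gamma)/\partial\gamma\,\partial\gamma'$ are

$$
\begin{aligned}
\frac{\partial^2m_{1,S}(Z_i;\gamma)}{\partial\theta\,\partial\theta'} &= \frac{1}{S}\sum_{s=1}^S\frac{\partial^2g(Z_i+t_{is};\theta)}{\partial\theta\,\partial\theta'}\frac{f_\delta(t_{is};\psi)}{\phi(t_{is})}, \\
\frac{\partial^2m_{1,S}(Z_i;\gamma)}{\partial\theta\,\partial\psi'} &= \frac{1}{S}\sum_{s=1}^S\frac{\partial g(Z_i+t_{is};\theta)}{\partial\theta}\frac{\partial f_\delta(t_{is};\psi)}{\partial\psi'}\frac{1}{\phi(t_{is})}, \\
\frac{\partial^2m_{1,S}(Z_i;\gamma)}{\partial\psi\,\partial\psi'} &= \frac{1}{S}\sum_{s=1}^S\frac{g(Z_i+t_{is};\theta)}{\phi(t_{is})}\frac{\partial^2f_\delta(t_{is};\psi)}{\partial\psi\,\partial\psi'},
\end{aligned}
$$

and the nonzero elements in $\partial^2m_{2,S}(Z_i;\gamma)/\partial\gamma\,\partial\gamma'$ are

$$
\begin{aligned}
\frac{\partial^2m_{2,S}(Z_i;\gamma)}{\partial\theta\,\partial\theta'} &= \frac{2}{S}\sum_{s=1}^S\left[\frac{\partial^2g(Z_i+t_{is};\theta)}{\partial\theta\,\partial\theta'}g(Z_i+t_{is};\theta) + \frac{\partial g(Z_i+t_{is};\theta)}{\partial\theta}\frac{\partial g(Z_i+t_{is};\theta)}{\partial\theta'}\right]\frac{f_\delta(t_{is};\psi)}{\phi(t_{is})}, \\
\frac{\partial^2m_{2,S}(Z_i;\gamma)}{\partial\theta\,\partial\psi'} &= \frac{2}{S}\sum_{s=1}^S\frac{g(Z_i+t_{is};\theta)}{\phi(t_{is})}\frac{\partial g(Z_i+t_{is};\theta)}{\partial\theta}\frac{\partial f_\delta(t_{is};\psi)}{\partial\psi'}, \\
\frac{\partial^2m_{2,S}(Z_i;\gamma)}{\partial\psi\,\partial\psi'} &= \frac{1}{S}\sum_{s=1}^S\frac{g^2(Z_i+t_{is};\theta)}{\phi(t_{is})}\frac{\partial^2f_\delta(t_{is};\psi)}{\partial\psi\,\partial\psi'}.
\end{aligned}
$$

Again, by Assumption A7 and Lemma 7,

$$
\begin{aligned}
\frac{1}{n}\frac{\partial^2Q_{n,S}(\tilde\gamma_{n,S})}{\partial\gamma\,\partial\gamma'} &\xrightarrow{P} E\left[\frac{\partial\rho_1^{(S)\prime}(\gamma_0)}{\partial\gamma}W_1\frac{\partial\rho_1^{(2S)}(\gamma_0)}{\partial\gamma'} + (\rho_1^{(2S)\prime}(\gamma_0)W_1\otimes I_{p+q+1})\frac{\partial\operatorname{vec}(\partial\rho_1^{(S)\prime}(\gamma_0)/\partial\gamma)}{\partial\gamma'}\right] \\
&\quad + E\left[\frac{\partial\rho_1^{(2S)\prime}(\gamma_0)}{\partial\gamma}W_1\frac{\partial\rho_1^{(S)}(\gamma_0)}{\partial\gamma'} + (\rho_1^{(S)\prime}(\gamma_0)W_1\otimes I_{p+q+1})\frac{\partial\operatorname{vec}(\partial\rho_1^{(2S)\prime}(\gamma_0)/\partial\gamma)}{\partial\gamma'}\right] \\
&= 2E\left[\frac{\partial\rho_1^{(S)\prime}(\gamma_0)}{\partial\gamma}W_1\frac{\partial\rho_1^{(2S)}(\gamma_0)}{\partial\gamma'}\right] \\
&= 2B,
\end{aligned}
\tag{28}
$$

where the first equality follows from

$$E\left[(\rho_1^{(2S)\prime}(\gamma_0)W_1\otimes I_{p+q+1})\frac{\partial\operatorname{vec}(\partial\rho_1^{(S)\prime}(\gamma_0)/\partial\gamma)}{\partial\gamma'}\right] = E\left[(E(\rho_1^{(2S)\prime}(\gamma_0)\mid Z_1)W_1\otimes I_{p+q+1})\frac{\partial\operatorname{vec}(\partial\rho_1^{(S)\prime}(\gamma_0)/\partial\gamma)}{\partial\gamma'}\right] = 0$$

(and similarly with $S$ and $2S$ interchanged), and the last equality holds because

$$
\begin{aligned}
E\left[\frac{\partial\rho_1^{(S)\prime}(\gamma_0)}{\partial\gamma}W_1\frac{\partial\rho_1^{(2S)}(\gamma_0)}{\partial\gamma'}\right] &= E\left[E\left(\frac{\partial\rho_1^{(S)\prime}(\gamma_0)}{\partial\gamma}\,\Big|\,Z_1\right)W_1E\left(\frac{\partial\rho_1^{(2S)}(\gamma_0)}{\partial\gamma'}\,\Big|\,Z_1\right)\right] \\
&= E\left[\frac{\partial\rho_1'(\gamma_0)}{\partial\gamma}W_1\frac{\partial\rho_1(\gamma_0)}{\partial\gamma'}\right].
\end{aligned}
$$

By (26)–(28) and the Slutsky theorem, we have $\sqrt n(\hat\gamma_{n,S} - \gamma_0) \xrightarrow{L} N(0, B^{-1}C_SB^{-1})$. Finally, (20) can be similarly shown by Lemma 7. □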

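Proof of Corollary 4. To simplify notation, in the following we denote $\rho_i = \rho_i(\gamma_0)$ and, correspondingly, $\rho_i^{(S)} = \rho_i^{(S)}(\gamma_0)$ and $\rho_i^{(2S)} = \rho_i^{(2S)}(\gamma_0)$. Then the common term of $\partial Q_{n,S}(\gamma_0)/\partial\gamma$ in (26) can be written as

$$T = \frac{\partial\rho_i^{(S)\prime}}{\partial\gamma}W_i\rho_i^{(2S)} + \frac{\partial\rho_i^{(2S)\prime}}{\partial\gamma}W_i\rho_i^{(S)} = T_1 + T_2 + T_3,$$

where

$$T_1 = 2\frac{\partial\rho_i'}{\partial\gamma}W_i\rho_i,$$

$$T_2 = \frac{\partial\rho_i'}{\partial\gamma}W_i(\rho_i^{(S)} - \rho_i) + \frac{\partial(\rho_i^{(S)} - \rho_i)'}{\partial\gamma}W_i\rho_i + \frac{\partial\rho_i'}{\partial\gamma}W_i(\rho_i^{(2S)} - \rho_i) + \frac{\partial(\rho_i^{(2S)} - \rho_i)'}{\partial\gamma}W_i\rho_i$$

and

$$T_3 = \frac{\partial(\rho_i^{(S)} - \rho_i)'}{\partial\gamma}W_i(\rho_i^{(2S)} - \rho_i) + \frac{\partial(\rho_i^{(2S)} - \rho_i)'}{\partial\gamma}W_i(\rho_i^{(S)} - \rho_i).$$

Since $\rho_i^{(S)}$ and $\rho_i^{(2S)}$ are conditionally independent given $Y_i$ and $Z_i$, with conditional mean $\rho_i$, the terms $T_1$, $T_2$ and $T_3$ are mutually uncorrelated and hence

$$E(TT') = E(T_1T_1') + E(T_2T_2') + E(T_3T_3'), \tag{29}$$

where $E(TT') = 4C_S$ and $E(T_1T_1') = 4C$. Furthermore, since $\rho_i^{(S)}$ and $\rho_i^{(2S)}$ have the same distribution,

$$E(T_2T_2') = 2E\left[\left(\frac{\partial\rho_i'}{\partial\gamma}W_i(\rho_i^{(S)} - \rho_i) + \frac{\partial(\rho_i^{(S)} - \rho_i)'}{\partial\gamma}W_i\rho_i\right)\left(\frac{\partial\rho_i'}{\partial\gamma}W_i(\rho_i^{(S)} - \rho_i) + \frac{\partial(\rho_i^{(S)} - \rho_i)'}{\partial\gamma}W_i\rho_i\right)'\right].$$

Now write $\rho_i^{(S)}(\gamma_0) = \frac{1}{S}\sum_{s=1}^S\rho_{is}$, where

$$\rho_{is} = \left(Y_i - \frac{g(Z_i+t_{is};\theta_0)f_\delta(t_{is};\psi_0)}{\phi(t_{is})},\; Y_i^2 - \frac{g^2(Z_i+t_{is};\theta_0)f_\delta(t_{is};\psi_0)}{\phi(t_{is})} - \sigma_{\varepsilon0}^2\right)'.$$

Then, since $\rho_{is}$, $s = 1, 2, \ldots, S$, are independent given $Y_i, Z_i$, we have

$$E(T_2T_2') = \frac{2}{S}E\left[\left(\frac{\partial\rho_i'}{\partial\gamma}W_i(\rho_{i1} - \rho_i) + \frac{\partial(\rho_{i1} - \rho_i)'}{\partial\gamma}W_i\rho_i\right)\left(\frac{\partial\rho_i'}{\partial\gamma}W_i(\rho_{i1} - \rho_i) + \frac{\partial(\rho_{i1} - \rho_i)'}{\partial\gamma}W_i\rho_i\right)'\right]. \tag{30}$$

In the same way, we can show that

$$E(T_3T_3') = \frac{1}{S^2}E\left[\left(\frac{\partial(\rho_{i1} - \rho_i)'}{\partial\gamma}W_i(\rho_{i2} - \rho_i) + \frac{\partial(\rho_{i2} - \rho_i)'}{\partial\gamma}W_i(\rho_{i1} - \rho_i)\right)\left(\frac{\partial(\rho_{i1} - \rho_i)'}{\partial\gamma}W_i(\rho_{i2} - \rho_i) + \frac{\partial(\rho_{i2} - \rho_i)'}{\partial\gamma}W_i(\rho_{i1} - \rho_i)\right)'\right]. \tag{31}$$

The corollary follows from (29)–(31). □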